add structured error types and retry logic for 429/5xx#7
Conversation
Greptile SummaryThis PR adds a typed error hierarchy (
Confidence Score: 5/5Safe to merge — the retry and error-dispatch logic is correct, idempotency guards work as described, and The core changes are well-structured: the idempotency guard correctly prevents POST/PATCH 5xx retries, No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[HTTP request via _request / post_to_stream_host] --> B{status < 400?}
B -- yes --> C[Return response]
B -- no --> D[to_api_error typed subclass]
D --> E{_should_retry?}
E -- no: 4xx non-429, or 5xx on POST/PATCH --> F[raise exc]
E -- yes: 429 or 5xx on idempotent method --> G{attempt < max_retries?}
G -- no --> F
G -- yes --> H[_parse_retry_after]
H --> I{Retry-After header?}
I -- seconds string --> J[use parsed float]
I -- HTTP-date string --> K[compute delta from now]
I -- absent --> L[exponential backoff with full jitter]
J & K & L --> M[sleep / asyncio.sleep]
M --> A
Reviews (2): Last reviewed commit: "address review: idempotent-only 5xx retr..." | Re-trigger Greptile |
| for attempt in range(self._max_retries + 1): | ||
| response = self._client.request(method, path, headers=headers, **kwargs) | ||
| if response.status_code < 400: | ||
| return response.json() if response.content else None | ||
| exc = to_api_error(response) | ||
| if attempt < self._max_retries and _should_retry(exc): | ||
| delay = _retry_delay(attempt, _parse_retry_after(response)) | ||
| logger.debug("retrying %s %s (attempt %d, delay %.2fs)", method, path, attempt + 1, delay) | ||
| time.sleep(delay) | ||
| continue | ||
| raise exc |
There was a problem hiding this comment.
Non-idempotent POST/PATCH retried on 5xx
_request is used by post() and patch() as well as get() and delete(), so the retry loop applies to all of them equally. A 5xx can be returned by a proxy or load balancer after the upstream already committed the write — retrying will then execute the mutation a second time. For example, client.agents.create(...) hitting a 502 from a gateway would create the agent twice before the caller sees any error. The retry should be limited to idempotent methods (GET, DELETE, HEAD) or guarded by a method check.
| def _parse_retry_after(response: httpx.Response) -> float | None: | ||
| raw = response.headers.get("retry-after") | ||
| if raw is None: | ||
| return None | ||
| try: | ||
| return float(raw) | ||
| except ValueError: | ||
| return None |
There was a problem hiding this comment.
Retry-After HTTP-date format is silently dropped
RFC 7231 allows Retry-After to be either a delay-in-seconds (Retry-After: 30) or an HTTP-date (Retry-After: Wed, 21 Oct 2015 07:28:00 GMT). When the header contains a date string, float(raw) raises ValueError, the function returns None, and the code falls back to exponential backoff — which for attempt=0 is at most 0.5 s. If the server sent a date meaning "wait 30 s", the client will retry far too soon, burn through all retries, and still surface a RateLimitError to the caller while likely worsening the rate-limit situation on the server side.
| @@ -184,7 +227,7 @@ def stream_typed( | |||
| yield response_model.model_validate_json(event.data) | |||
| except ValidationError: | |||
| # Server-side mid-stream error frames or schema-drift events | |||
| # land here. | |||
| # land here. | |||
| logger.debug("dropped non-%s SSE: %s", response_model.__name__, event.data) | |||
| continue | |||
|
|
|||
There was a problem hiding this comment.
stream and stream_typed have no retry logic
Both HttpClient.stream / HttpClient.stream_typed (and their async counterparts) still use the original single-attempt pattern. A 429 or 5xx on a streaming request will raise immediately, while the same status on a non-streaming request through _request or post_to_stream_host will be retried up to max_retries times. Callers who expect consistent retry behaviour regardless of whether they use the streaming or non-streaming variant will be surprised by this inconsistency.
| @respx.mock | ||
| def test_retries_on_500_then_succeeds(client: Gumloop) -> None: | ||
| route = respx.get(f"{API_BASE}/agents").mock( | ||
| side_effect=[ | ||
| httpx.Response(500, json={"error": {"message": "internal error"}}), | ||
| httpx.Response(200, json={"agents": []}), | ||
| ] | ||
| ) | ||
|
|
||
| result = client.agents.list() | ||
|
|
||
| assert result.agents == [] | ||
| assert route.call_count == 2 |
There was a problem hiding this comment.
Test performs a real
time.sleep during retry
test_retries_on_500_then_succeeds does not patch time.sleep, so the first retry will actually block for random.uniform(0, 0.5) seconds. While jitter can make this 0, it is non-deterministic and will occasionally slow down the test suite. All other retry tests that care about timing explicitly patch time.sleep — this one should too for consistency.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
…leep patch in tests
What
APIStatusErrorsubclasses dispatched by status code:BadRequestError(400),PermissionDeniedError(403),NotFoundError(404),UnprocessableEntityError(422),RateLimitError(429),ServerError(5xx)HttpClientandAsyncHttpClientmax_retriesonGumloopandAsyncGumloopconstructors (default: 2)Why
Right now everything non-2xx raises a generic
APIStatusError, so callershave to inspect
.status_codemanually to know what went wrong. Specificsubclasses make error handling a lot cleaner — you can just catch
RateLimitErrororNotFoundErrordirectly.The retry logic handles transient 429s and 5xx failures automatically.
Only server-side errors are retried, never 4xx. Respects
Retry-Afterwhen present, otherwise falls back to exponential backoff with jitter.
max_retries=0disables it entirely if needed.Tests
18 new tests in
tests/sdk/test_errors.py.